ERROR BOUNDS FOR COMPUTING THE 
EXPECTATION BY MARKOV CHAIN MONTE CARLO 



DANIEL RUDOLF 

Abstract. We study the error of reversible Markov chain Monte
Carlo methods for approximating the expectation of a function.
Explicit error bounds with respect to the $l_2$-, $l_4$- and
$l_\infty$-norm of the function are proven. The estimates attain the
well-known asymptotic limit of the error, i.e. there is no gap between
the estimate and the asymptotic behavior. We discuss the dependence
of the error on a burn-in of the Markov chain. Furthermore
we suggest and justify a specific burn-in for optimizing the algorithm.



1. Introduction 

We start with a probability distribution $\pi$ on a finite set $D$ and a
function $f : D \to \mathbb{R}$. The goal is to compute the expectation
denoted by

$$S(f) = \sum_{x \in D} f(x)\,\pi(x).$$

Let the cardinality of $D$ be so large that an exact computation
of the sum is practically impossible. Furthermore suppose that
the desired distribution is not explicitly given, i.e. we have no
random number generator for $\pi$ available. Problems of this kind arise
in statistical physics, in statistics, and in financial mathematics (see
for instance [GRS96, Liu08]). The idea of approximating $S(f)$ via Markov
chain Monte Carlo (MCMC) is the following: Run a Markov chain on
$D$ to simulate the distribution $\pi$ and compute the time average over
the last $n$ steps. Let $X_1, \dots, X_{n+n_0}$ be the chain, then we
obtain as approximation

$$S_{n,n_0}(f) = \frac{1}{n} \sum_{j=1}^{n} f(X_{j+n_0}).$$

The number $n_0$ is the so-called burn-in; loosely speaking, this is the
number of time steps taken to warm up. Afterwards the distribution of the
generated Markov chain is (hopefully) close to the stationary one.
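
The time average with burn-in can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's implementation: the lazy random walk on a cycle of ten states, the names `lazy_cycle_step` and `mcmc_estimate`, and all parameter values are hypothetical choices, picked so that the stationary distribution is uniform and $S(f)$ is known exactly.

```python
import random


def lazy_cycle_step(x, rng, size=10):
    """One step of a lazy random walk on the cycle {0, ..., size-1}.

    Illustrative chain (an assumption, not from the paper): stay with
    probability 1/2, otherwise move to a uniformly chosen neighbour.
    The transition matrix is symmetric, so pi is uniform.
    """
    u = rng.random()
    if u < 0.5:
        return x
    if u < 0.75:
        return (x + 1) % size
    return (x - 1) % size


def mcmc_estimate(f, step, x_start, n, n0, rng):
    """Average f over X_{n0+1}, ..., X_{n0+n}, discarding the burn-in."""
    x = x_start                     # X_1
    for _ in range(n0):             # move from X_1 to X_{n0+1}
        x = step(x, rng)
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = step(x, rng)
    return total / n


rng = random.Random(0)
estimate = mcmc_estimate(lambda x: float(x), lazy_cycle_step, 0, 20000, 200, rng)
exact = sum(range(10)) / 10         # S(f) for f(x) = x under the uniform pi
print(estimate, exact)
```

The deterministic start at state 0 plays the role of a point-mass initial distribution $\nu$; the burn-in loop only advances the chain and records nothing.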

A Markov chain is identified with its initial distribution $\nu$ and its
transition matrix $P$. We restrict ourselves to ergodic chains, i.e. the
second largest absolute value $\beta$ of the eigenvalues of $P$ is smaller
than one. It is well known that the distribution of these chains reaches
stationarity exponentially fast (see [Bre99, RR97, LPW09]).

The error of $S_{n,n_0}$ for $f \in \mathbb{R}^D$ is measured by

$$e_\nu(S_{n,n_0}, f) = \left( E_{\nu,P} \left| S_{n,n_0}(f) - S(f) \right|^2 \right)^{1/2},$$

where $E_{\nu,P}$ denotes the expectation of the Markov chain. The
asymptotic behavior of the integration error can be written in terms of
the eigenvalues and eigenfunctions of $P$. It holds true that

$$\lim_{n \to \infty} n \cdot e_\nu(S_{n,n_0}, f)^2 \le \frac{1+\beta_1}{1-\beta_1}\, \|f\|_2^2,$$

where $\beta_1$ is the second largest eigenvalue (see [Sok97, Mat99]). The
constant is optimal, but this statement does not give an error bound
for finite $n$ and says nothing about the choice of $n_0$. What does an
explicit error bound of the MCMC method look like where the asymptotic
behavior is attained?

Let us give an outline of the structure and the main results. Section 2
contains the notation used and presents some relevant statements
concerning Markov chains. Section 3 contains the new results. The explicit
error bound is developed with respect to the $l_2$-, $l_4$- and
$l_\infty$-norm of the function $f$. For $\|f\|_\infty \le 1$ and
$C = 2\sqrt{\|\nu/\pi - 1\|_\infty}$ we obtain the following. The error obeys

$$e_\nu(S_{n,n_0}, f)^2 \le \frac{2}{n(1-\beta_1)} + \frac{2C\beta^{n_0}}{n^2(1-\beta)^2}.$$

For details and estimates concerning $l_2$ and $l_4$ we refer to Theorem 11
in Section 3.3. In Section 4 it turns out that
$n_0 = \max\left\{ \left\lceil \frac{\log(C)}{\log(\beta^{-1})} \right\rceil, 0 \right\}$
is a reasonable choice for the burn-in. Then the error bound simplifies
to

$$e_\nu(S_{n,n_0}, f)^2 \le \frac{2}{n(1-\beta_1)} + \frac{2}{n^2(1-\beta)^2}.$$

In many examples a good estimate for $\beta$ can be achieved, see for
instance [MR02, BD06, BL07]. Therefore it is straightforward to apply
the explicit error bound.

2. Preliminaries 

The Markov chain $X_1, X_2, \dots$ is a stochastic process with state
space $D$. We identify it with its initial distribution $\nu$ and transition
matrix $P = (p(x,y))_{x,y \in D}$ and denote it by $(P, \nu)$. For
$x, y \in D$ the entry $p(x,y)$ is the probability of jumping from state
$x$ to state $y$ in one step of the chain.

By $Pf(x) = \sum_{y \in D} p(x,y) f(y)$ we obtain the expectation of the
value of $f \in \mathbb{R}^D$ after one step of the chain starting from
$x \in D$. The expectation after $k$ steps of the Markov chain from $x$ is
given by $P^k f(x) = \sum_{y \in D} p^k(x,y) f(y)$, where
$P^k = (p^k(x,y))_{x,y \in D}$ denotes the $k$-th power of $P$. Similarly we
consider the application of $P$ to a distribution $\nu$, i.e.
$\nu P(x) = \sum_{y \in D} \nu(y) p(y,x)$. This is the distribution
which arises after one step where the initial state was chosen by
$\nu$. The distribution which arises after $k$ steps is given by
$\nu P^k(x) = \sum_{y \in D} \nu(y) p^k(y,x)$.
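
The two ways of applying $P$ (to a function from the left, to a distribution from the right) can be illustrated on a small matrix. The three-state transition matrix below and the helper names `apply_to_function` and `apply_to_distribution` are hypothetical and serve only as an example; the sketch shows that $\sum_x \nu(x)\,Pf(x)$ and $\sum_x \nu P(x)\,f(x)$ give the same expected value of $f$ after one step.

```python
def apply_to_function(P, f):
    """Return Pf, where (Pf)(x) = sum_y p(x, y) f(y)."""
    size = len(P)
    return [sum(P[x][y] * f[y] for y in range(size)) for x in range(size)]


def apply_to_distribution(nu, P):
    """Return nu P, where (nu P)(x) = sum_y nu(y) p(y, x)."""
    size = len(P)
    return [sum(nu[y] * P[y][x] for y in range(size)) for x in range(size)]


# Hypothetical three-state chain, chosen only for illustration.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
nu = [1.0, 0.0, 0.0]
f = [1.0, 2.0, 4.0]

Pf = apply_to_function(P, f)
nuP = apply_to_distribution(nu, P)

# Both expressions give the expected value of f after one step from nu.
lhs = sum(nu[x] * Pf[x] for x in range(3))
rhs = sum(nuP[x] * f[x] for x in range(3))
print(Pf, nuP, lhs, rhs)
```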

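The expectation $E_{\nu,P}$ of the Markov chain $X_1, \dots, X_{n+n_0}$ is
taken with respect to the probability measure on $D^{n+n_0}$ that assigns
to a path $(x_1, \dots, x_{n+n_0})$ the mass

$$\nu(x_1)\, p(x_1,x_2) \cdots p(x_{n+n_0-1}, x_{n+n_0}).$$

Using this for $i \le j$ we obtain a characterization by the
transition matrix

$$E_{\nu,P}\big(f(X_i) f(X_j)\big) = \sum_{x \in D} P^i (f P^{j-i} f)(x)\, \nu(x). \tag{1}$$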

2.1. Reversibility and spectral structure. We call the Markov
chain with transition matrix $P$, or simply $P$, reversible with respect
to a probability measure $\pi$ if the detailed balance condition

$$\pi(x) p(x,y) = \pi(y) p(y,x)$$

holds true for $x, y \in D$. If $P$ is reversible, then $\pi$ is a so-called
stationary distribution of the Markov chain, i.e. $\pi P(x) = \pi(x)$.
Note that if $P$ is reversible then $P^k$ is also reversible. Let us define
the weighted scalar product

$$\langle f, g \rangle_\pi = \sum_{x \in D} f(x) g(x) \pi(x)$$

for functions $f, g \in \mathbb{R}^D$. Then let
$\|f\|_2 = \langle f, f \rangle_\pi^{1/2}$. By considering the scalar
product it is easy to show that reversibility is equivalent to
$P$ being self-adjoint. Furthermore suppose that the underlying Markov
chain is irreducible and aperiodic; this is also called ergodic. For
details of these conditions we refer to the literature, for instance
[Hag02, Bre99, LPW09]. It is a well-known fact that this implies the
uniqueness of the stationary distribution. Applying the spectral theorem
for self-adjoint stochastic matrices and ergodicity we obtain that $P$ has
real eigenvalues

$$1 = \beta_0 > \beta_1 \ge \beta_2 \ge \dots \ge \beta_{|D|-1} > -1$$

with a basis of orthogonal eigenfunctions $u_i$, for
$i \in \{0, \dots, |D|-1\}$, i.e.

$$P u_i = \beta_i u_i, \qquad \langle u_i, u_j \rangle_\pi = \delta_{ij}.$$

Additionally one can see that $u_0(x) = 1$ and $S(u_i) = 0$ for $i > 0$.
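
As a concrete check of these notions, the following sketch verifies detailed balance, self-adjointness with respect to $\langle\cdot,\cdot\rangle_\pi$, and the eigen-structure for one small chain. The three-state matrix, its stationary distribution, the hand-computed eigenpairs and the helper names are all assumptions made for this illustration.

```python
import math


def inner(f, g, pi):
    """Weighted scalar product <f, g>_pi = sum_x f(x) g(x) pi(x)."""
    return sum(a * b * w for a, b, w in zip(f, g, pi))


def apply_to_function(P, f):
    """Return Pf, where (Pf)(x) = sum_y p(x, y) f(y)."""
    return [sum(p * v for p, v in zip(row, f)) for row in P]


def is_reversible(P, pi, tol=1e-12):
    """Detailed balance: pi(x) p(x, y) = pi(y) p(y, x) for all x, y."""
    size = len(P)
    return all(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) <= tol
               for x in range(size) for y in range(size))


# Hypothetical three-state chain and its stationary distribution.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

# Eigenvalues and pi-orthonormal eigenfunctions of this particular P,
# worked out by hand for the example.
betas = [1.0, 0.5, 0.0]
u = [[1.0, 1.0, 1.0],
     [math.sqrt(2.0), 0.0, -math.sqrt(2.0)],
     [1.0, -1.0, 1.0]]

f, g = [1.0, 2.0, 4.0], [3.0, -1.0, 0.5]
print(is_reversible(P, pi))
# Self-adjointness: both numbers agree up to rounding.
print(inner(apply_to_function(P, f), g, pi), inner(f, apply_to_function(P, g), pi))
# Gram matrix of the eigenfunctions; close to the identity.
print([[round(inner(a, b, pi), 12) for b in u] for a in u])
```

For this chain $\beta_1 = 1/2$ and $\beta_2 = 0$, so $\beta = \max\{\beta_1, |\beta_2|\} = 1/2$.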



2.2. Convergence of the chain. The speed of convergence of the
Markov chain to stationarity is measured by the so-called
$\chi^2$-contrast. Let $\nu, \mu$ be distributions on $D$, then

$$\chi^2(\nu, \mu) = \sum_{x \in D} \frac{(\nu(x) - \mu(x))^2}{\mu(x)}.$$

The $\chi^2$-contrast is not symmetric and therefore not a distance. For
arbitrary distributions it can be very large, i.e.
$\chi^2(\nu, \mu) \le \|\nu/\mu - 1\|_\infty$, where

$$\left\| \frac{\nu}{\mu} - 1 \right\|_\infty = \max_{x \in D} \left| \frac{\nu(x)}{\mu(x)} - 1 \right|.$$

From [Bre99, Theorem 3.3 p. 209] we have

$$\chi^2(\nu P^k, \pi) \le \beta^{2k} \chi^2(\nu, \pi), \tag{2}$$




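where $\beta = \max\{\beta_1, |\beta_{|D|-1}|\}$ denotes the second largest
absolute value of the eigenvalues. Let us turn to another presentation of
the convergence property. We have

$$\begin{aligned}
\nu P^k(x) - \pi(x) &= \sum_{y \in D} \nu(y)\, p^k(y,x) - \pi(x) \\
&= \sum_{y \in D} \frac{\nu(y)}{\pi(y)}\, p^k(x,y)\, \pi(x) - \pi(x) \\
&= \sum_{y \in D} p^k(x,y)\, \frac{\nu(y) - \pi(y)}{\pi(y)}\, \pi(x).
\end{aligned}$$

The second equality follows by the reversibility of the Markov chain.
For simplicity let

$$d_k(x) := \sum_{y \in D} p^k(x,y)\, \frac{\nu(y) - \pi(y)}{\pi(y)},$$

such that altogether

$$\|d_k\|_2 = \sqrt{\chi^2(\nu P^k, \pi)} \le \beta^k \sqrt{\chi^2(\nu, \pi)} \le \beta^k \sqrt{\left\| \frac{\nu}{\pi} - 1 \right\|_\infty}. \tag{3}$$

Since $\beta < 1$ we have an exponential decay of the norm as $k \to \infty$.
We define the weighted sequence spaces for $1 \le p < \infty$ by

$$l_p = l_p(D, \pi) := \Big\{ f \in \mathbb{R}^D : \|f\|_p^p = \sum_{x \in D} |f(x)|^p\, \pi(x) < \infty \Big\}.$$

It is clear that $l_p = \mathbb{R}^D$, since the state space has finite
cardinality.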



Remark 1. As we have seen, the $\chi^2$-contrast corresponds to the
$l_2$-norm of the function $d_k$. Other tools for measuring the speed of
convergence induce similar relations. For instance

$$\|d_k\|_1 = 2 \left\| \nu P^k - \pi \right\|_{\mathrm{tv}} \quad \text{and} \quad \|d_k\|_\infty = \left\| \frac{\nu P^k}{\pi} - 1 \right\|_\infty.$$

The total variation corresponds to the $l_1$-norm of $d_k$ and the
$l_\infty$-norm to the supremum distance.
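
The contraction $\chi^2(\nu P^k, \pi) \le \beta^{2k}\chi^2(\nu,\pi)$ from (2) can be observed numerically. The sketch below is an illustration only: the three-state reversible chain (with $\beta = 1/2$, an assumption verified by hand for this matrix), the point-mass start and the helper names `chi2` and `step_distribution` are hypothetical.

```python
def chi2(nu, mu):
    """chi^2-contrast: sum_x (nu(x) - mu(x))^2 / mu(x)."""
    return sum((a - b) ** 2 / b for a, b in zip(nu, mu))


def step_distribution(nu, P):
    """Return nu P, where (nu P)(x) = sum_y nu(y) p(y, x)."""
    size = len(P)
    return [sum(nu[y] * P[y][x] for y in range(size)) for x in range(size)]


# Hypothetical reversible three-state chain with eigenvalues 1, 1/2, 0.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]
beta = 0.5

nu = [1.0, 0.0, 0.0]                 # deterministic start in state 0
start = chi2(nu, pi)

holds = []
current = nu
for k in range(1, 11):
    current = step_distribution(current, P)
    contrast = chi2(current, pi)
    bound = beta ** (2 * k) * start
    holds.append(contrast <= bound + 1e-15)
    print(k, contrast, bound)
```

Each printed pair shows the contrast after $k$ steps next to the right-hand side of (2); the contrast never exceeds the bound in this example.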

Remark 2. The constant $\beta$ plays a crucial role in estimating the speed
of convergence of the Markov chain to stationarity. In general it is not
easy to handle $\beta_1$ or $\beta$, but there are different auxiliary
tools, e.g. the canonical path technique, conductance (see [JS89] and
[DS91]), log-Sobolev inequalities and path coupling. For a small survey
see [Ran06].

2.3. Norm of the transition matrix. Let us consider $P$ and $S$ as
operators acting on $l_p$. Then the functional $S$ maps arbitrary functions
to constant functions. Let

$$l_p^0 := l_p^0(D, \pi) = \{ g \in l_p : S(g) = 0 \} \quad \text{for } 2 \le p \le \infty.$$

The norm of $P$ as an operator on $l_2^0$ and $l_4^0$ is essential in the
analysis. We state and show some results which are implied by the Theorem
of Riesz-Thorin. For a proof and an introduction we refer to [BS88].

Proposition 1 (Theorem of Riesz-Thorin). Let $1 \le p, q_1, q_2 \le \infty$.
Further let $\theta \in (0, 1)$ and

$$\frac{1}{p} = \frac{1-\theta}{q_1} + \frac{\theta}{q_2}$$

and

$$T : l_{q_1} \to l_{q_1} \text{ with } \|T\|_{l_{q_1} \to l_{q_1}} \le M_1, \qquad
T : l_{q_2} \to l_{q_2} \text{ with } \|T\|_{l_{q_2} \to l_{q_2}} \le M_2.$$

Then

$$\|T\|_{l_p \to l_p} \le 2\, M_1^{1-\theta} M_2^{\theta}.$$

Note that the factor two in the last inequality comes from the fact
that we consider real-valued functions $f$. In the following we show a
relation between $P$, $P - S$ and $\beta$.

Lemma 2. Let $P$ be a reversible transition matrix with respect to $\pi$
and $n \in \mathbb{N}$. Then

$$\|P^n - S\|_{l_2 \to l_2} = \|P^n\|_{l_2^0 \to l_2^0} = \beta^n. \tag{4}$$

Furthermore if $2 \le p \le \infty$ then

$$\|P^n\|_{l_p^0 \to l_p^0} \le \|P^n - S\|_{l_p \to l_p} \le 2. \tag{5}$$

Proof. The self-adjointness of $P$ implies
$\|P\|_{l_2^0 \to l_2^0} = \max\{\beta_1, |\beta_{|D|-1}|\} = \beta$,
such that $\|P^n\|_{l_2^0 \to l_2^0} = \beta^n$. By

$$\begin{aligned}
\|P^n - S\|_{l_2 \to l_2} &= \sup_{\|f\|_2 \le 1} \|(P^n - S) f\|_2 = \sup_{\|f\|_2 \le 1} \|P^n (f - S(f))\|_2 \\
&\le \sup_{\|g\|_2 \le 1,\ S(g) = 0} \|P^n g\|_2 = \|P^n\|_{l_2^0 \to l_2^0}
\end{aligned}$$

and

$$\begin{aligned}
\|P^n\|_{l_p^0 \to l_p^0} &= \sup_{\|g\|_p \le 1,\ S(g) = 0} \|P^n g\|_p = \sup_{\|g\|_p \le 1,\ S(g) = 0} \|P^n g - S(g)\|_p \\
&\le \sup_{\|f\|_p \le 1} \|(P^n - S) f\|_p = \|P^n - S\|_{l_p \to l_p}
\end{aligned}$$

claim (4) and the first part of (5) are shown. Finally, by applying the
triangle inequality of the norm,

$$\|P^n - S\|_{l_p \to l_p} = \sup_{\|f\|_p \le 1} \|P^n f - S f\|_p \le \|P^n\|_{l_p \to l_p} + \|S\|_{l_p \to l_p} = 2.$$

□

The next statement adds the result about the matrix norm which is 
used in the proof of the error bound. 

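Lemma 3. Let $P$ be a reversible transition matrix with respect to $\pi$
and $n \in \mathbb{N}$. Then

$$\|P^n\|_{l_4^0 \to l_4^0} \le 2\sqrt{2}\, \beta^{n/2}. \tag{6}$$

Proof. By Lemma 2 we have

$$\|P^n - S\|_{l_2 \to l_2} = \beta^n \quad \text{and} \quad \|P^n - S\|_{l_\infty \to l_\infty} \le 2.$$

Then the result is an application of Proposition 1 together with the
first inequality of (5), where $T = P^n - S$ and $q_1 = 2$, $q_2 = \infty$,
$p = 4$, thus $\theta = \frac{1}{2}$. □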



3. Error bounds 

In this section we follow two steps to develop the error bound.
First a special case of the method $S_{n,n_0}$ is considered: the initial
distribution is the stationary one, thus it is not necessary to do a
burn-in, i.e. $n_0 = 0$. Secondly we relate the result of the first step to
the general case where the chain is initialized by a distribution $\nu$.
The techniques which we will use are similar to those in [Rud09].

3.1. Starting from stationarity. This is also called starting in
equilibrium, i.e. the distribution of the Markov chain does not change; it
is already balanced. In the following we will always denote $S_{n,0}$ as
$S_n$. Let us start with stating and discussing a result from
[BD06, Prop. 2.1 p. 3].

Proposition 4. Let $f \in \mathbb{R}^D$. Let $X_1, \dots, X_n$ be a
reversible Markov chain with respect to $\pi$, given by $(P, \pi)$. Then

$$e_\pi(S_n, f)^2 = \frac{1}{n^2} \sum_{k=1}^{|D|-1} |a_k|^2\, W(n, \beta_k), \tag{7}$$

where $a_k = \langle f, u_k \rangle_\pi$ and

$$W(n, \beta_k) = \frac{n(1 - \beta_k^2) - 2\beta_k(1 - \beta_k^n)}{(1 - \beta_k)^2}.$$



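Proof. Let us consider $g := f - S(f) \in \mathbb{R}^D$. Because of the
orthogonal basis we have the representation
$g(x) = \sum_{k=1}^{|D|-1} a_k u_k(x)$. The error obeys

$$\begin{aligned}
e_\pi(S_n, f)^2 &= E_{\pi,P} \Big| \frac{1}{n} \sum_{j=1}^{n} g(X_j) \Big|^2 \\
&= \frac{1}{n^2} \sum_{j=1}^{n} E_{\pi,P}\, g(X_j)^2 + \frac{2}{n^2} \sum_{j=1}^{n-1} \sum_{i=j+1}^{n} E_{\pi,P}\, g(X_j) g(X_i).
\end{aligned}$$

For $i \le j$,

$$\begin{aligned}
E_{\pi,P}\, g(X_i) g(X_j) &= \sum_{k=1}^{|D|-1} \sum_{l=1}^{|D|-1} a_k a_l\, E_{\pi,P}\, u_k(X_i) u_l(X_j) \\
&= \sum_{k=1}^{|D|-1} \sum_{l=1}^{|D|-1} a_k a_l\, \langle u_k, P^{j-i} u_l \rangle_\pi \\
&= \sum_{k=1}^{|D|-1} \sum_{l=1}^{|D|-1} a_k a_l\, \beta_l^{j-i} \langle u_k, u_l \rangle_\pi = \sum_{k=1}^{|D|-1} a_k^2\, \beta_k^{j-i},
\end{aligned}$$

where the equality of the second line is due to the fact that the initial
step is chosen from the stationary distribution. The last two equalities
follow from the orthonormality of the basis of the eigenvectors.
Altogether we have

$$\begin{aligned}
e_\pi(S_n, f)^2 &= \frac{1}{n^2} \sum_{k=1}^{|D|-1} a_k^2 \Big[ n + 2 \sum_{j=1}^{n-1} \sum_{i=j+1}^{n} \beta_k^{i-j} \Big] \\
&= \frac{1}{n^2} \sum_{k=1}^{|D|-1} a_k^2 \Big[ n + 2\, \frac{(n-1)\beta_k - n\beta_k^2 + \beta_k^{n+1}}{(1-\beta_k)^2} \Big] \\
&= \frac{1}{n^2} \sum_{k=1}^{|D|-1} |a_k|^2\, W(n, \beta_k).
\end{aligned}$$

□

Let us consider $W(n, \beta_k)$ to simplify and interpret Proposition 4.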

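Lemma 5. For all $n \in \mathbb{N}$ and $k \in \{1, \dots, |D|-1\}$ we have

$$W(n, \beta_k) \le W(n, \beta_1) \le \frac{2n}{1 - \beta_1}. \tag{8}$$

Proof. Let $x \in [-1, 1)$. We are going to show that $W(n, x)$ is
monotone increasing in $x$, which gives $W(n, \beta_k) \le W(n, \beta_1)$.
For $i \in \{0, \dots, n\}$ it is true that

$$x^{n-i} \le 1 \;\Longrightarrow\; (1 - x^i)\, x^{n-i} \le 1 - x^i \;\Longleftrightarrow\; x^i + x^{n-i} \le 1 + x^n.$$

Therefore

$$x^i + x^{i+1} + x^{n-i-1} + x^{n-i} \le 2(1 + x^n),$$

and

$$(1 + x) \sum_{i=0}^{n-1} x^i = \frac{1}{2} \sum_{i=0}^{n-1} \left( x^i + x^{i+1} + x^{n-i-1} + x^{n-i} \right) \le n(1 + x^n).$$

Now

$$\frac{dW}{dx}(n, x) = -2\, \frac{(1 + x) \sum_{i=0}^{n-1} x^i - n(1 + x^n)}{(1 - x)^2} \ge 0,$$

and the first inequality is shown. By

$$W(n, \beta_1) \le \begin{cases} n, & \beta_1 \in [-1, 0], \\ \frac{n(1+\beta_1)}{1-\beta_1}, & \beta_1 \in (0, 1), \end{cases} \;\le\; \frac{2n}{1 - \beta_1}$$

the claim is proven. □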
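
The closed form of $W(n, x)$ and the two inequalities of Lemma 5 can be checked numerically. The sketch below is a sanity check on a small grid, not a proof; the function names `w_closed` and `w_sum` and the grid values are choices made for this illustration. `w_sum` evaluates $n + 2\sum_{j=1}^{n-1}\sum_{i=j+1}^{n} x^{i-j}$, the double sum that arises in the proof of Proposition 4.

```python
import math


def w_closed(n, x):
    """W(n, x) = (n (1 - x^2) - 2 x (1 - x^n)) / (1 - x)^2 for x in [-1, 1)."""
    return (n * (1 - x * x) - 2 * x * (1 - x ** n)) / (1 - x) ** 2


def w_sum(n, x):
    """n + 2 * sum_{j=1}^{n-1} sum_{i=j+1}^{n} x^(i-j)."""
    return n + 2 * sum(x ** (i - j) for j in range(1, n) for i in range(j + 1, n + 1))


grid = [-1.0, -0.5, 0.0, 0.3, 0.9, 0.99]
for n in (1, 2, 5, 30):
    values = [w_closed(n, x) for x in grid]
    agrees = all(math.isclose(w_closed(n, x), w_sum(n, x), rel_tol=1e-9, abs_tol=1e-9)
                 for x in grid)
    increasing = all(a <= b + 1e-9 for a, b in zip(values, values[1:]))
    bounded = all(w_closed(n, x) <= 2 * n / (1 - x) + 1e-9 for x in grid)
    print(n, agrees, increasing, bounded)
```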

An explicit formula for the error if the initial state is chosen by the
stationary distribution is established. Let us discuss the worst case
error of $S_n$.

Proposition 6. Let $X_1, \dots, X_n$ be a reversible Markov chain with
respect to $\pi$, given by $(P, \pi)$. Then

$$\sup_{\|f\|_2 \le 1} e_\pi(S_n, f)^2 = \frac{1+\beta_1}{n(1-\beta_1)} - \frac{2\beta_1(1-\beta_1^n)}{n^2(1-\beta_1)^2} \le \frac{2}{n(1-\beta_1)}. \tag{9}$$

Proof. The individual error of $f$ is

$$\begin{aligned}
e_\pi(S_n, f)^2 &= \frac{1}{n^2} \sum_{k=1}^{|D|-1} |a_k|^2\, W(n, \beta_k) \le \frac{\|f\|_2^2}{n^2} \max_{k=1,\dots,|D|-1} W(n, \beta_k) \\
&\overset{(8)}{=} \frac{1+\beta_1}{n(1-\beta_1)}\, \|f\|_2^2 - \frac{2\beta_1(1-\beta_1^n)}{n^2(1-\beta_1)^2}\, \|f\|_2^2,
\end{aligned}$$

where $a_k$ is chosen as in Proposition 4 and therefore
$\sum_{k=1}^{|D|-1} |a_k|^2 \le \|f\|_2^2$. From the preceding analysis of
the individual error we have an upper error bound. Now we consider
$f = u_1$, where obviously $\|u_1\|_2 = 1$, and get by applying (7) that

$$e_\pi(S_n, u_1)^2 = \frac{1+\beta_1}{n(1-\beta_1)} - \frac{2\beta_1(1-\beta_1^n)}{n^2(1-\beta_1)^2}.$$

Thus the error bound is attained for $u_1$ and by (8) everything is shown.

□
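
The equality case of Proposition 6 can be confirmed by brute force on a tiny chain: summing over all paths started from $\pi$ gives the exact mean square error for $f = u_1$, which can then be compared with the closed form. The three-state chain, its eigenpair ($\beta_1 = 1/2$, $u_1 = (\sqrt{2}, 0, -\sqrt{2})$, computed by hand for this matrix) and the function names are assumptions made for this sketch.

```python
import itertools
import math

# Hypothetical reversible three-state chain with beta_1 = 1/2.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]
beta1 = 0.5
u1 = [math.sqrt(2.0), 0.0, -math.sqrt(2.0)]


def exact_squared_error(f, n):
    """E_pi |S_n(f) - S(f)|^2 by summing over all 3^n paths started from pi."""
    target = sum(fx * w for fx, w in zip(f, pi))
    total = 0.0
    for path in itertools.product(range(3), repeat=n):
        prob = pi[path[0]]
        for a, b in zip(path, path[1:]):
            prob *= P[a][b]
        average = sum(f[x] for x in path) / n
        total += prob * (average - target) ** 2
    return total


def closed_form(n, b):
    """(1 + b) / (n (1 - b)) - 2 b (1 - b^n) / (n^2 (1 - b)^2)."""
    return (1 + b) / (n * (1 - b)) - 2 * b * (1 - b ** n) / (n ** 2 * (1 - b) ** 2)


for n in range(1, 7):
    print(n, exact_squared_error(u1, n), closed_form(n, beta1), 2 / (n * (1 - beta1)))
```

The enumeration grows like $3^n$, so it is only usable for very small $n$; its purpose is to confirm the formula, not to replace it.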

Finally an explicit presentation for the worst case error on the class
of bounded functions with respect to $\|\cdot\|_2$ is shown. Notice that (9)
is an equality, which means that the integration error is completely
known if we start with the stationary distribution. In some artificial
cases this method even beats direct simulation, e.g. if all eigenvalues
are smaller than zero or if one specific $\beta_i < 0$ and the goal is to
approximate $S(u_i)$. In [FHY92, Remark 3, p. 617] the authors state a
simple transition matrix where $\beta_i = -\frac{1}{|D|-1}$ for all
$i \ge 1$. Now one could think of constructing a transition matrix where
$\beta_1$ is close to $-1$ and thereby damping the integration error. But it
is well known that this is not possible for large $|D|$, since
$\beta_1 \ge -\frac{1}{|D|-1}$.

In the next subsection we link the results to a more general framework,
where the unrealistic assumption that the initial distribution is
the stationary one is abandoned.

3.2. Starting from somewhere else. In the next statement a relation
between the error when starting from $\pi$ and the error when not starting
from the invariant distribution is established.

Proposition 7. Let $f \in \mathbb{R}^D$ and $g := f - S(f)$. Let
$X_1, \dots, X_{n+n_0}$ be a reversible Markov chain with respect to $\pi$,
given by $(P, \nu)$. Then

$$e_\nu(S_{n,n_0}, f)^2 = e_\pi(S_n, f)^2 + \frac{1}{n^2} \sum_{j=1}^{n} L_{j+n_0}(g^2) + \frac{2}{n^2} \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} L_{j+n_0}(g P^{k-j} g), \tag{10}$$

where

$$L_i(h) = \sum_{x \in D} d_i(x) h(x) \pi(x) = \sum_{x \in D} \sum_{y \in D} p^i(x,y)\, \frac{\nu(y) - \pi(y)}{\pi(y)}\, h(x) \pi(x).$$

Remark 3. The proof of this identity is similar to that in [Rud09], except
for the fact that we study a discrete state space and therefore integrals
become sums.

Proof. It is easy to see that

$$\begin{aligned}
E_{\nu,P} |S(f) - S_{n,n_0}(f)|^2 &= \frac{1}{n^2} \sum_{j=1}^{n} \sum_{i=1}^{n} E_{\nu,P}\big(g(X_{n_0+j})\, g(X_{n_0+i})\big) \\
&= \frac{1}{n^2} \sum_{j=1}^{n} \sum_{x \in D} P^{n_0+j}(g^2)(x)\, \nu(x) + \frac{2}{n^2} \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} \sum_{x \in D} P^{n_0+j}(g P^{k-j} g)(x)\, \nu(x).
\end{aligned}$$

For every function $h \in \mathbb{R}^D$ and $i \in \mathbb{N}$, applying the
reversibility, the following transformation holds true:

$$\begin{aligned}
\sum_{x \in D} P^i h(x)\, \nu(x) &= \sum_{x \in D} \sum_{y \in D} \frac{\nu(x)}{\pi(x)}\, p^i(x,y)\, h(y)\, \pi(x) \\
&= \sum_{x \in D} \sum_{y \in D} \frac{\nu(y)}{\pi(y)}\, p^i(x,y)\, h(x)\, \pi(x) \\
&= \sum_{x \in D} h(x) \pi(x) + \sum_{x \in D} \sum_{y \in D} p^i(x,y)\, \frac{\nu(y) - \pi(y)}{\pi(y)}\, h(x) \pi(x) \\
&= \sum_{x \in D} (P^i h)(x)\, \pi(x) + \sum_{x \in D} \sum_{y \in D} p^i(x,y)\, \frac{\nu(y) - \pi(y)}{\pi(y)}\, h(x) \pi(x).
\end{aligned}$$

Using this in the setting above, formula (10) is shown.

□

Equation (10) is still an error characterization where equality holds.
We will estimate $L_k(h)$ to derive an upper bound. This depends very
much on the speed of convergence of the chain to stationarity.

Lemma 8. Let $h \in \mathbb{R}^D$ and let again
$\beta = \max\{\beta_1, |\beta_{|D|-1}|\}$. Then

$$|L_k(h)| \le \beta^k \left\| \frac{\nu}{\pi} - 1 \right\|_\infty \|h\|_1, \tag{11}$$

$$|L_k(h)| \le \beta^k \sqrt{\left\| \frac{\nu}{\pi} - 1 \right\|_\infty}\; \|h\|_2. \tag{12}$$

Proof. Let us consider $L_k(h) = \langle d_k, h \rangle_\pi$. After applying
the Cauchy-Schwarz inequality we obtain

$$|L_k(h)| \le \|d_k\|_2\, \|h\|_2.$$

By applying (3) we showed (12). Inequality (12) and $\|h\|_2 \le \dots$
imply (11). □

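The ingredients for an explicit error bound for $S_{n,n_0}$ are now
gathered. Mainly the last lemma ensures an exponential decay
of $L_k(h)$, which is used in the next proposition.

Proposition 9. Let $X_1, \dots, X_{n+n_0}$ be a reversible Markov chain with
respect to $\pi$, given by $(P, \nu)$. Let $f \in \mathbb{R}^D$,
$g := f - S(f)$ and

$$V(\beta, n) := \sum_{j=1}^{n} \beta^j + \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} 2\beta^k, \qquad
U(\beta, n) := \sum_{j=1}^{n} \beta^j + \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} 4\sqrt{2}\, \beta^{\frac{k+j}{2}}.$$

(i) Then for $g \in l_2^0$ we have

$$e_\nu(S_{n,n_0}, f)^2 \le e_\pi(S_n, f)^2 + \frac{\beta^{n_0}}{n^2} \left\| \frac{\nu}{\pi} - 1 \right\|_\infty V(\beta, n)\, \|g\|_2^2.$$

(ii) Then for $g \in l_4^0$ we have

$$e_\nu(S_{n,n_0}, f)^2 \le e_\pi(S_n, f)^2 + \frac{\beta^{n_0}}{n^2} \sqrt{\left\| \frac{\nu}{\pi} - 1 \right\|_\infty}\; U(\beta, n)\, \|g\|_4^2.$$

(iii) Then for $g \in l_\infty^0$ we have

$$e_\nu(S_{n,n_0}, f)^2 \le e_\pi(S_n, f)^2 + \frac{\beta^{n_0}}{n^2} \sqrt{\left\| \frac{\nu}{\pi} - 1 \right\|_\infty}\; V(\beta, n)\, \|g\|_\infty^2.$$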

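Proof. As we have seen in (10) the error obeys

$$e_\nu(S_{n,n_0}, f)^2 = e_\pi(S_n, f)^2 + \frac{1}{n^2} \sum_{j=1}^{n} L_{j+n_0}(g^2) + \frac{2}{n^2} \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} L_{j+n_0}(g P^{k-j} g). \tag{13}$$

Then by (11), the Cauchy-Schwarz inequality and
$\|P^{k-j}\|_{l_2^0 \to l_2^0} = \beta^{k-j}$ we get

$$|L_{j+n_0}(g^2)| \le \left\| \frac{\nu}{\pi} - 1 \right\|_\infty \beta^{j+n_0}\, \|g\|_2^2, \qquad
|L_{j+n_0}(g P^{k-j} g)| \le \left\| \frac{\nu}{\pi} - 1 \right\|_\infty \beta^{k+n_0}\, \|g\|_2^2.$$

Putting this in the sums of equation (13) and letting
$\varepsilon_0 = \|\nu/\pi - 1\|_\infty\, \beta^{n_0}$ we obtain

$$\begin{aligned}
\sum_{j=1}^{n} |L_{j+n_0}(g^2)| + 2 \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} |L_{j+n_0}(g P^{k-j} g)|
&\le \varepsilon_0 \|g\|_2^2 \sum_{j=1}^{n} \beta^j + \varepsilon_0 \|g\|_2^2 \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} 2\beta^k \\
&= \varepsilon_0 \|g\|_2^2 \Big( \sum_{j=1}^{n} \beta^j + \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} 2\beta^k \Big) = \varepsilon_0 \|g\|_2^2\, V(\beta, n).
\end{aligned}$$

Thus claim (i) is shown. Now we use (12) and

$$\|g P^{k-j} g\|_2 \le \|g\|_\infty \|P^{k-j} g\|_2 \le \|g\|_\infty^2\, \beta^{k-j}$$

to obtain

$$|L_{j+n_0}(g^2)| \le \sqrt{\left\| \frac{\nu}{\pi} - 1 \right\|_\infty}\; \beta^{j+n_0}\, \|g\|_\infty^2, \qquad
|L_{j+n_0}(g P^{k-j} g)| \le \sqrt{\left\| \frac{\nu}{\pi} - 1 \right\|_\infty}\; \beta^{k+n_0}\, \|g\|_\infty^2.$$

Exactly the same steps as in the proof of (i) follow, except for a different
$\varepsilon_0 = \sqrt{\|\nu/\pi - 1\|_\infty}\, \beta^{n_0}$ and the supremum
norm, i.e. assertion (iii) is proven. Let us turn to (ii). Again we use (12)
and estimate

$$\|g P^{k-j} g\|_2 \le \|g\|_4 \|P^{k-j} g\|_4 \le \|P^{k-j}\|_{l_4^0 \to l_4^0}\, \|g\|_4^2 \overset{(6)}{\le} 2\sqrt{2}\, \beta^{\frac{k-j}{2}}\, \|g\|_4^2.$$

Thus

$$|L_{j+n_0}(g^2)| \le \sqrt{\left\| \frac{\nu}{\pi} - 1 \right\|_\infty}\; \beta^{j+n_0}\, \|g\|_4^2, \qquad
|L_{j+n_0}(g P^{k-j} g)| \le 2\sqrt{2} \sqrt{\left\| \frac{\nu}{\pi} - 1 \right\|_\infty}\; \beta^{\frac{k+j}{2}+n_0}\, \|g\|_4^2.$$

For $\varepsilon_0 = \sqrt{\|\nu/\pi - 1\|_\infty}\, \beta^{n_0}$ we obtain

$$\begin{aligned}
\sum_{j=1}^{n} |L_{j+n_0}(g^2)| + 2 \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} |L_{j+n_0}(g P^{k-j} g)|
&\le \varepsilon_0 \|g\|_4^2 \sum_{j=1}^{n} \beta^j + \varepsilon_0 \|g\|_4^2 \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} 4\sqrt{2}\, \beta^{\frac{k+j}{2}} \\
&= \varepsilon_0 \|g\|_4^2\, U(\beta, n).
\end{aligned}$$

Finally by substituting this in equation (13) everything is shown. □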

In the last proposition we introduced $V(\beta, n)$ and $U(\beta, n)$. These
functions are bounded if $\beta < 1$. By applying the infinite geometric
series several times the following is proven.

Lemma 10. For $n \in \mathbb{N}$ and $x \in [0, 1)$ we have

$$V(x, n) \le \frac{2}{(1-x)^2}, \qquad U(x, n) \le \frac{4\sqrt{2}}{(1-x)(1-\sqrt{x})}. \tag{14}$$

This implies that the asymptotic optimality is reached.
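
The two geometric-series bounds can be checked numerically for finite $n$. The sketch below evaluates the two sums directly, namely $\sum_{j=1}^{n} x^j + \sum_{j=1}^{n-1}\sum_{k=j+1}^{n} 2x^k$ and $\sum_{j=1}^{n} x^j + \sum_{j=1}^{n-1}\sum_{k=j+1}^{n} 4\sqrt{2}\,x^{(k+j)/2}$, and compares them with $2/(1-x)^2$ and $4\sqrt{2}/((1-x)(1-\sqrt{x}))$. The function names and the test grid are hypothetical; this is a sanity check, not a proof.

```python
import math


def geometric_double_sum(x, n):
    """sum_{j=1}^n x^j + sum_{j=1}^{n-1} sum_{k=j+1}^n 2 x^k."""
    single = sum(x ** j for j in range(1, n + 1))
    double = sum(2 * x ** k for j in range(1, n) for k in range(j + 1, n + 1))
    return single + double


def half_power_double_sum(x, n):
    """sum_{j=1}^n x^j + sum_{j=1}^{n-1} sum_{k=j+1}^n 4 sqrt(2) x^((k+j)/2)."""
    single = sum(x ** j for j in range(1, n + 1))
    double = sum(4 * math.sqrt(2) * x ** ((k + j) / 2)
                 for j in range(1, n) for k in range(j + 1, n + 1))
    return single + double


checks = []
for x in (0.0, 0.5, 0.9, 0.99):
    for n in (1, 10, 300):
        first_ok = geometric_double_sum(x, n) <= 2 / (1 - x) ** 2
        second_ok = (half_power_double_sum(x, n)
                     <= 4 * math.sqrt(2) / ((1 - x) * (1 - math.sqrt(x))))
        checks.append(first_ok and second_ok)
print(all(checks))
```

Because both bounds are independent of $n$, the correction terms in the error bounds decay like $1/n^2$ while the leading term decays like $1/n$.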

3.3. Main Theorem. The following is the main result.

Theorem 11. Let $X_1, \dots, X_{n+n_0}$ be a reversible Markov chain with
respect to $\pi$, given by $(P, \nu)$. Let $f \in \mathbb{R}^D$ and
$a_k = \langle f, u_k \rangle_\pi$. Then

$$\lim_{n \to \infty} n \cdot e_\nu(S_{n,n_0}, f)^2 = \lim_{n \to \infty} n \cdot e_\pi(S_n, f)^2 = \sum_{k=1}^{|D|-1} |a_k|^2\, \frac{1+\beta_k}{1-\beta_k}.$$

(i) If we consider $f \in l_2$ then

$$e_\nu(S_{n,n_0}, f)^2 \le \frac{2}{n(1-\beta_1)}\, \|f\|_2^2 + \frac{2 \left\| \frac{\nu}{\pi} - 1 \right\|_\infty \beta^{n_0}}{n^2(1-\beta)^2}\, \|f\|_2^2.$$

(ii) If we consider $f \in l_4$ then

$$e_\nu(S_{n,n_0}, f)^2 \le \frac{2}{n(1-\beta_1)}\, \|f\|_2^2 + \frac{16 \sqrt{\left\| \frac{\nu}{\pi} - 1 \right\|_\infty}\; \beta^{n_0}}{n^2(1-\beta)(1-\sqrt{\beta})}\, \|f\|_4^2.$$

(iii) If we consider $f \in l_\infty$ then

$$e_\nu(S_{n,n_0}, f)^2 \le \frac{2}{n(1-\beta_1)}\, \|f\|_2^2 + \frac{4 \sqrt{\left\| \frac{\nu}{\pi} - 1 \right\|_\infty}\; \beta^{n_0}}{n^2(1-\beta)^2}\, \|f\|_\infty^2.$$

Proof. By (10) and the fact that the remaining terms go to zero
quadratically as $n$ goes to infinity, we see that the asymptotic result
holds true. For $f \in l_2$ we have $\|f - S(f)\|_2 \le \|f\|_2$ and
furthermore if $p \ne 2$ then

$$\|f - S(f)\|_p \le \|f\|_p + |S(f)| \le \|f\|_p + \|f\|_1 \le 2 \|f\|_p.$$

Thus, via Proposition 9, Proposition 6 and Lemma 10 everything is
shown. □

Notice that from the estimate of Proposition 9 it follows immediately
that

$$\lim_{n \to \infty} n \cdot e_\nu(S_{n,n_0}, f)^2 \le \lim_{n \to \infty} n \cdot e_\pi(S_n, f)^2 \le \frac{1+\beta_1}{1-\beta_1}\, \|f\|_2^2.$$

Thus there is no gap between the estimate and the asymptotic behavior.
Also notice that the upper bounds are continuous in the sense
that if the initial distribution $\nu$ is $\pi$ then we obtain the bound of
Proposition 6. The dependence of the bounds of (ii) and (iii) in Theorem 11
on the initial distribution is encouraging for an extension to general
state spaces. (For an introduction to MCMC on general state spaces
we refer to [RR04].) But the dependence of the estimate on the initial
distribution in the $l_2$-case is disillusioning because of the additional
factor of $\|\nu/\pi - 1\|_\infty^{1/2}$.

In [Rud09, Theorem 8, p. 10] a similar $l_\infty$-bound of $S_{n,n_0}$ for
general state spaces is developed. This result holds for lazy, reversible
Markov chains and may also be applied in the present setting, i.e. if the
state space is discrete. In [Rud09] the asymptotic error limit is not
attained. Thus we could improve the error bound and weaken the laziness
condition, i.e. it is enough that $\beta_1 = \beta$. In
[LPW09, Thm. 12.19, p. 165] the authors obtained a comparable bound for
another error term where the chain starts deterministically.

4. Burn-in 

Let us assume that computer resources for $N$ time steps of the MCMC
method are available, i.e. $N = n + n_0$. We want to choose the burn-in
$n_0$ and the number $n$ such that the error is as small as possible.
The burn-in $n_0$ should be large, but this implies that $n$ is possibly
quite small, depending on how many resources we have. On the other hand
$n$ should be large, which again implies that $n_0$ is possibly small. There
is obviously a trade-off in choosing the parameters. In the next
statement we consider the error for an explicitly given burn-in, where
for simplicity $\beta_1 = \beta$.

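Corollary 12. Let $f \in \mathbb{R}^D$ be given and let

$$n_0 = \max\left\{ \left\lceil \frac{\log(C)}{\log(\beta^{-1})} \right\rceil, 0 \right\}.$$

(i) Let $C = \left\| \frac{\nu}{\pi} - 1 \right\|_\infty$, then

$$\sup_{\|f\|_2 \le 1} e_\nu(S_{n,n_0}, f)^2 \le \frac{2}{n(1-\beta)} + \frac{2}{n^2(1-\beta)^2}.$$

(ii) Let $C = 16 \sqrt{\left\| \frac{\nu}{\pi} - 1 \right\|_\infty}$, then

$$\sup_{\|f\|_4 \le 1} e_\nu(S_{n,n_0}, f)^2 \le \frac{2}{n(1-\beta)} + \frac{1}{n^2(1-\beta)(1-\sqrt{\beta})}.$$

(iii) Let $C = 2 \sqrt{\left\| \frac{\nu}{\pi} - 1 \right\|_\infty}$, then

$$\sup_{\|f\|_\infty \le 1} e_\nu(S_{n,n_0}, f)^2 \le \frac{2}{n(1-\beta)} + \frac{2}{n^2(1-\beta)^2}.$$

Note that in the $l_\infty$- and $l_2$-case the error bound is the same. Just
the constant $C$, which comes in through the density, is different. This
suggestion for the burn-in is justified in the following.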
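
The suggested burn-in is straightforward to compute once $C$ and $\beta$ are available. The sketch below is a minimal illustration; the function names `suggested_burn_in` and `bound_after_burn_in` and the sample values $C = 10^{30}$, $\beta = 0.9$ are choices made here. The defining property is that $C\beta^{n_0} \le 1$, which is what turns the $n_0$-dependent term of the error bound into a pure $1/n^2$ term.

```python
import math


def suggested_burn_in(C, beta):
    """n_0 = max(ceil(log(C) / log(1 / beta)), 0) for C > 0 and 0 < beta < 1."""
    return max(math.ceil(math.log(C) / math.log(1.0 / beta)), 0)


def bound_after_burn_in(n, beta):
    """2 / (n (1 - beta)) + 2 / (n^2 (1 - beta)^2), the l_2 and l_inf case."""
    return 2.0 / (n * (1.0 - beta)) + 2.0 / (n ** 2 * (1.0 - beta) ** 2)


C, beta = 1e30, 0.9
n0 = suggested_burn_in(C, beta)
# The second number is at most one by construction of n0.
print(n0, C * beta ** n0)
# Squared-error bound when n = 10^4 steps follow the burn-in.
print(bound_after_burn_in(10 ** 4, beta))
```

For $C \le 1$ the logarithm is non-positive and the maximum returns $n_0 = 0$, i.e. no burn-in is needed.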

4.1. Numerical experiments. Suppose $C$ (very large), $\beta$ (close to
one) and resources $N$ are given. The worst case error for
$\|f\|_2 \le 1$ or $\|f\|_\infty \le 1$ is bounded by

$$b_\infty(n, n_0) := \sqrt{\frac{2}{n(1-\beta)} + \frac{2C\beta^{n_0}}{n^2(1-\beta)^2}}$$

and if we consider $\|f\|_4 \le 1$ it is bounded by

$$b_4(n, n_0) := \sqrt{\frac{2}{n(1-\beta)} + \frac{C\beta^{n_0}}{n^2(1-\beta)(1-\sqrt{\beta})}}.$$

Since $N = n + n_0$ we can compute with a numerical procedure (here
using Maple) the optimal choice of the burn-in, denoted by $n_{opt}^4$ and
$n_{opt}^\infty$, to minimize the upper error bounds. (This is a simple
one-dimensional minimization problem with different parameters.)

    N       beta     n_opt^4       n_opt^inf     n_0 = ceil(log(C)/log(beta^-1))
                     (by Maple)    (by Maple)    (suggested above)
    10^4    0.9      656           656           656
    10^5    0.9      656           656           656
    10^4    0.99     6867          6867          6873
    10^5    0.99     6873          6873          6873
    10^4    0.999    8001          8001          69043
    10^5    0.999    68977         68977         69043

Table 1. For $C = 10^{30}$, where $n_{opt}^i$ minimizes
$b_i(N - n_{opt}^i, n_{opt}^i)$, $i = 4, \infty$.
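
The one-dimensional minimization can also be done by exhaustive search. The sketch below swaps in a plain Python brute-force loop for the Maple computation mentioned in the text; it assumes the bound $b_4(n, n_0) = \sqrt{2/(n(1-\beta)) + C\beta^{n_0}/(n^2(1-\beta)(1-\sqrt{\beta}))}$, and the names `b4` and `best_burn_in` are hypothetical.

```python
import math


def b4(n, n0, beta, C):
    """Upper bound b_4(n, n0) for functions with ||f||_4 <= 1."""
    first = 2.0 / (n * (1.0 - beta))
    second = C * beta ** n0 / (n ** 2 * (1.0 - beta) * (1.0 - math.sqrt(beta)))
    return math.sqrt(first + second)


def best_burn_in(N, beta, C):
    """Exhaustive search over n0 = 0, ..., N - 1 with n = N - n0."""
    return min(range(N), key=lambda n0: b4(N - n0, n0, beta, C))


C, beta, N = 1e30, 0.9, 10 ** 4
n_opt = best_burn_in(N, beta, C)
n_suggested = max(math.ceil(math.log(C) / math.log(1.0 / beta)), 0)
print(n_opt, n_suggested)
```

A search over all $N$ candidates costs only $N$ evaluations of the bound, so for the budgets considered here no specialised optimizer is needed.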



Table 1 gives a collection of typical results. It turned out that the
lower bound suggested above is almost the optimal choice. The computed
values $n_{opt}^4$ and $n_{opt}^\infty$ are almost the same as
$n_0 = \lceil \log(C) / \log(\beta^{-1}) \rceil$. In the case
$C = 10^{30}$ and $\beta = 0.999$ Theorem 11 gives for no choice of
$n$ and $n_0$ with $N = 10^4$ an error smaller than one.

In Figure 1 we plotted $b_4(N - n_0, n_0)$ for different $n_0$ and
$e_\pi(S_N, u_1) = \sqrt{\frac{1+\beta_1}{N(1-\beta_1)} - \frac{2\beta_1(1-\beta_1^N)}{N^2(1-\beta_1)^2}}$.
Roughly speaking one may see in Figure 1 that if the burn-in is chosen too
small a vertical shift takes place and if the burn-in is chosen too large
a horizontal shift takes place. The asymptotic behavior is the same, i.e.
in the long run the error of $S_{n,n_0}$ converges to the error of $S_n$.
If $\beta$ and $C$ are given we choose the burn-in as suggested above. If
there is an estimate of $\log(C)/\log(\beta^{-1})$ one should ensure that
it is not smaller than the real ratio. As seen in Figure 1, if it is
slightly smaller there is already a strong influence. By choosing the
burn-in too large the influence is less severe.

[Figure 1: error bound plotted against $N = n + n_0$ for burn-ins $n_0$
given by different multiples of $\log(C)/\log(\beta^{-1})$, and for
$n_0 = 0$ with initialization by $\pi$.]

Figure 1. For $\beta = 0.99$ and $C = 10^{30}$.

Finally if there is no estimate or computation of the parameters
$\beta$ or $C$, a simple but very efficient strategy is given by choosing
$n = n_0 = \frac{N}{2}$ (for even $N$). In Figure 2 we see
$b_4(\frac{N}{2}, \frac{N}{2})$, $b_4(N - n_0, n_0)$ and $e_\pi(S_N, u_1)$.
In the asymptotic behavior we pay the price of a factor of $\sqrt{2}$, i.e.
the asymptotic error is $\sqrt{2}$ times larger than $e_\pi(S_N, u_1)$
where we started in equilibrium. This strategy works well and reaches
the same convergence rate as choosing the burn-in as suggested above,
which is seen in Figure 2.

[Figure 2: error bound plotted against $N = n_0 + n$ for
$n_0 = \lceil \log(C)/\log(\beta^{-1}) \rceil$, for $n_0 = n = \frac{N}{2}$,
and for $n_0 = 0$ with initialization by $\pi$.]

Figure 2. For $\beta = 0.99$ and $C = 10^{30}$.
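
The price of the half-and-half strategy can be quantified with a short computation. The sketch below assumes the bound $b_4(n, n_0) = \sqrt{2/(n(1-\beta)) + C\beta^{n_0}/(n^2(1-\beta)(1-\sqrt{\beta}))}$ and the equilibrium error $e_\pi(S_N, u_1)$ from the equality case of Proposition 6 with $\beta_1 = \beta$; the function names and the sample values are hypothetical. Under these formulas the ratio of the two tends to $2/\sqrt{1+\beta}$, which is close to $\sqrt{2}$ when $\beta$ is close to one.

```python
import math


def b4(n, n0, beta, C):
    """Upper bound b_4(n, n0) for functions with ||f||_4 <= 1."""
    first = 2.0 / (n * (1.0 - beta))
    second = C * beta ** n0 / (n ** 2 * (1.0 - beta) * (1.0 - math.sqrt(beta)))
    return math.sqrt(first + second)


def equilibrium_error(N, beta1):
    """e_pi(S_N, u_1): error for f = u_1 when the chain starts from pi."""
    return math.sqrt((1 + beta1) / (N * (1 - beta1))
                     - 2 * beta1 * (1 - beta1 ** N) / (N ** 2 * (1 - beta1) ** 2))


C, beta = 1e30, 0.99
for N in (10 ** 4, 10 ** 5, 10 ** 6):
    half = N // 2
    ratio = b4(half, half, beta, C) / equilibrium_error(N, beta)
    print(N, ratio)
limit = 2 / math.sqrt(1 + beta)
print(limit, math.sqrt(2))
```

For small budgets the ratio is still dominated by the term $C\beta^{N/2}$; only once $N/2$ clearly exceeds $\log(C)/\log(\beta^{-1})$ does it settle near the limiting value.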
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